<h1 id="customjson">Custom JSON formats</h1>

<h2>ECMAScript JSON shortcomings</h2>

<p>The standard JSON format has a number of shortcomings when used with
ECMAScript:</p>
<ul>
<li><code>undefined</code> and function values are not supported</li>
<li>NaN and infinity values are not supported</li>
<li>Duktape custom types are, of course, not supported</li>
<li>Codepoints above BMP cannot be represented except as surrogate pairs</li>
<li>Codepoints above U+10FFFF cannot be represented even as surrogate pairs</li>
<li>The output is not printable ASCII which is often inconvenient</li>
</ul>

<p>These limitations are part of the ECMAScript specification which
explicitly prohibits more lenient behavior.  Duktape provides two more
programmer friendly custom JSON format variants: <b>JX</b> and <b>JC</b>,
described below.</p>

<h2>Custom JX format</h2>

<p>JX encodes all values in a very readable manner and parses back
almost all values in a faithful manner (function values being the most
important exception).  Output is pure printable ASCII, codepoints above
U+FFFF are encoded with a custom escape format, and quotes around object
keys are omitted in most cases.  JX is not JSON compatible but a very
readable format, most suitable for debugging, logging, etc.</p>

<p>JX is used as follows:</p>
<pre class="ecmascript-code">
var obj = { foo: 0/0, bar: [ 1, undefined, 3 ] };
print(Duktape.enc('jx', obj));
// prints out: {foo:NaN,bar:[1,undefined,3]}

var dec = Duktape.dec('jx', '{ foo: 123, bar: undefined, quux: NaN }');
print(dec.foo, dec.bar, dec.quux);
// prints out: 123 undefined NaN
</pre>

<h2>Custom JC format</h2>

<p>JC encodes all values into standard JSON.  Values not supported by
standard JSON are encoded as objects with a marker key beginning with an
underscore (e.g. <code>{"_ptr":"0xdeadbeef"}</code>).  Such values parse
back as ordinary objects.  However, you can revive them manually more or
less reliably.  Output is pure printable ASCII; codepoints above U+FFFF
are encoded as plain string data with the format "U+nnnnnnnn"
(e.g. <code>U+0010fedc</code>).</p>

<p>JC is used as follows:</p>
<pre class="ecmascript-code">
var obj = { foo: 0/0, bar: [ 1, undefined, 3 ] };
print(Duktape.enc('jc', obj));
// prints out: {"foo":{"_nan":true},"bar":[1,{"_undef":true},3]}

var dec = Duktape.dec('jc', '{ "foo": 123, "bar": {"_undef":true}, "quux": {"_nan":true} }');
print(dec.foo, dec.bar, dec.quux);
// prints out: 123 [object Object] [object Object]
</pre>

<p>The JC decoder is essentially the same as the standard JSON decoder
at the moment: all JC outputs are valid JSON and no custom syntax is needed.
As shown in the example, custom values (like <code>{"_undef":true}</code>)
are <b>not</b> revived automatically.  They parse back as ordinary objects
instead.</p>

<h2>Codepoints above U+FFFF and invalid UTF-8 data</h2>

<p>All standard ECMAScript strings are valid CESU-8 data internally, so
behavior for codepoints above U+FFFF never poses compliance issues.  However,
Duktape strings may contain <a href="#extended-utf8">extended UTF-8</a>
codepoints and may even contain invalid UTF-8 data.</p>

<p>The Duktape JSON implementation, including the standard ECMAScript JSON API,
use replacement characters to deal with invalid UTF-8 data.  The resulting
string may look a bit odd, but this behavior is preferable to throwing an
error.</p>

<h2>JSON format examples</h2>

<p>The table below summarizes how different values encode in each
encoding:</p>

<table>
<thead>
<tr>
<th>Value</th>
<th>Standard JSON</th>
<th>JX</th>
<th>JC</th>
<th>Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td>undefined</td>
<td>n/a</td>
<td><code>undefined</code></td>
<td><code>{"_undef":true}</code></td>
<td>Standard JSON: encoded as <code>null</code> inside arrays, otherwise omitted</td>
</tr>
<tr>
<td>null</td>
<td><code>null</code></td>
<td><code>null</code></td>
<td><code>null</code></td>
<td>standard JSON</td>
</tr>
<tr>
<td>true</td>
<td><code>true</code></td>
<td><code>true</code></td>
<td><code>true</code></td>
<td>standard JSON</td>
</tr>
<tr>
<td>false</td>
<td><code>false</code></td>
<td><code>false</code></td>
<td><code>false</code></td>
<td>standard JSON</td>
</tr>
<tr>
<td>123.4</td>
<td><code>123.4</code></td>
<td><code>123.4</code></td>
<td><code>123.4</code></td>
<td>standard JSON</td>
</tr>
<tr>
<td>+0</td>
<td><code>0</code></td>
<td><code>0</code></td>
<td><code>0</code></td>
<td>standard JSON</td>
</tr>
<tr>
<td>-0</td>
<td><code>0</code></td>
<td><code>-0</code></td>
<td><code>-0</code></td>
<td>Standard JSON allows <code>-0</code> but serializes negative
    zero as <code>0</code> (losing the sign unnecessarily)</td>
</tr>
<tr>
<td>NaN</td>
<td><code>null</code></td>
<td><code>NaN</code></td>
<td><code>{"_nan":true}</code></td>
<td>Standard JSON: always encoded as <code>null</code></td>
<td></td>
</tr>
<tr>
<td>Infinity</td>
<td><code>null</code></td>
<td><code>Infinity</code></td>
<td><code>{"_inf":true}</code></td>
<td>Standard JSON: always encoded as <code>null</code></td>
</tr>
<tr>
<td>-Infinity</td>
<td><code>null</code></td>
<td><code>-Infinity</code></td>
<td><code>{"_ninf":true}</code></td>
<td>Standard JSON: always encoded as <code>null</code></td>
</tr>
<tr>
<td>"k&#xf6;h&#xe4;"</td>
<td><code>"k&#xf6;h&#xe4;"</code></td>
<td><code>"k\xf6h\xe4"</code></td>
<td><code>"k\u00f6h\u00e4"</code></td>
<td></td>
</tr>
<tr>
<td>U+00FC</td>
<td><code>"\u00fc"</code></td>
<td><code>"\xfc"</code></td>
<td><code>"\u00fc"</code></td>
<td></td>
</tr>
<tr>
<td>U+ABCD</td>
<td><code>"\uabcd"</code></td>
<td><code>"\uabcd"</code></td>
<td><code>"\uabcd"</code></td>
<td></td>
</tr>
<tr>
<td>U+1234ABCD</td>
<td><code>"U+1234abcd"</code></td>
<td><code>"\U1234abcd"</code></td>
<td><code>"U+1234abcd"</code></td>
<td>Non-BMP characters are not standard ECMAScript, JX format borrowed from Python</td>
</tr>
<tr>
<td>object</td>
<td><code>{"my_key":123}</code></td>
<td><code>{my_key:123}</code></td>
<td><code>{"my_key":123}</code></td>
<td>ASCII keys matching identifer requirements encoded without quotes in JX</td>
</tr>
<tr>
<td>array</td>
<td><code>["foo","bar"]</code></td>
<td><code>["foo","bar"]</code></td>
<td><code>["foo","bar"]</code></td>
<td></td>
</tr>
<tr>
<td>buffer</td>
<td><code>n/a</code></td>
<td><code>|deadbeef|</code></td>
<td><code>{"_buf":"deadbeef"}</code></td>
<td></td>
</tr>
<tr>
<td>pointer</td>
<td><code>n/a</code></td>
<td><code>(0xdeadbeef)<br />(DEADBEEF)</code></td>
<td><code>{"_ptr":"0xdeadbeef"}<br />{"_ptr":"DEADBEEF"}</code></td>
<td>Representation inside parentheses or quotes is platform specific</td>
</tr>
<tr>
<td>NULL pointer</td>
<td><code>n/a</code></td>
<td><code>(null)</code></td>
<td><code>{"_ptr":"null"}</code></td>
<td></td>
</tr>
<tr>
<td>function</td>
<td><code>n/a</code></td>
<td><code>{_func:true}</code></td>
<td><code>{"_func":true}</code></td>
<td>Standard JSON: encoded as <code>null</code> inside arrays, otherwise omitted</td>
</tr>
<tr>
<td>lightfunc</td>
<td><code>n/a</code></td>
<td><code>{_func:true}</code></td>
<td><code>{"_func":true}</code></td>
<td>Formats like ordinary functions</td>
</tr>
</tbody>
</table>

<h2>Limitations</h2>

<p>Some limitations include:</p>

<ul>
<li>Only enumerable own properties are serialized in any of the formats.</li>
<li>Array properties (other than the entries) are not serialized.  This would
    be useful in e.g. logging, e.g. as <code>[1,2,3,"type":"point"]</code>.</li>
<li>Array gaps are not preserved, and parse back as <code>undefined</code>.</li>
<li>There is no automatic revival of special values when parsing JC data.</li>
<li>There is no canonical encoding.  This would be easy to arrange with a simple
    option to sort object keys during encoding.</li>
</ul>

<p>(See internal documentation for more future work issues.)</p>

